Hierarchical Transformer Network for Utterance-Level Emotion Recognition
Similar Articles
Improved Emotion Recognition with Novel Global Utterance-level Features
Traditional features, extracted frame by frame, cannot accurately reflect the dynamic characteristics of emotional speech signals. To solve this problem, novel global utterance-level features are first proposed using multi-scale optimal wavelet packet decomposition, without dividing the emotional speech into frames; then, to handle the case of few training samples, a fusion strategy...
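The idea of decomposing a whole utterance with a wavelet packet tree (rather than framing it) can be sketched in plain NumPy. This is a minimal illustration using a hand-rolled Haar analysis step, not the multi-scale optimal decomposition of the cited paper; the signal and the three-level depth are hypothetical:

```python
import numpy as np

def haar_step(x):
    """One Haar analysis step: split a signal into orthonormal
    approximation and detail halves."""
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def wavelet_packet_energies(signal, levels):
    """Full wavelet packet tree over the whole utterance (no framing);
    return the energy of each node at the deepest level."""
    nodes = [np.asarray(signal, dtype=float)]
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_step(node)]
    return np.array([np.sum(n ** 2) for n in nodes])

# Hypothetical "utterance": 1024 samples -> 8 subband energies at level 3
rng = np.random.default_rng(0)
utt = rng.normal(size=1024)
feats = wavelet_packet_energies(utt, levels=3)
```

Because the Haar transform is orthonormal, the subband energies sum to the total signal energy, so the fixed-length feature vector summarizes the whole utterance regardless of its duration.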
Hierarchical Spatial Transformer Network
Computer vision researchers have long expected neural networks to have the spatial transformation ability to eliminate interference caused by geometric distortion. The emergence of the spatial transformer network made this dream come true. Spatial transformer networks and their variants handle global displacement well, but lack the ability to deal with local spatial variance. Hence how ...
Utterance independent bimodal emotion recognition in spontaneous communication
Emotion expressions are sometimes mixed with utterance expressions in spontaneous face-to-face communication, which makes emotion recognition difficult. This article introduces methods for reducing utterance influences in visual parameters for audio-visual emotion recognition. The audio and visual channels are first combined under a Multi-stream Hidden Markov Model (MH...
Class-level spectral features for emotion recognition
The most common approaches to automatic emotion recognition rely on utterance-level prosodic features. Recent studies have shown that utterance-level statistics of segmental spectral features also contain rich information about expressivity and emotion. In our work, we introduce a more fine-grained yet robust set of spectral features: statistics of Mel-Frequency Cepstral Coefficients computed ov...
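Collapsing frame-level spectral features into utterance-level statistics can be sketched as follows. The frame matrix here is synthetic; a real system would extract MFCCs with a speech toolkit, and the choice of statistics (mean, standard deviation, min, max) is one common convention, not necessarily the cited paper's exact set:

```python
import numpy as np

def utterance_stats(frame_mfccs):
    """Collapse frame-level MFCCs (n_frames x n_coeffs) into a single
    utterance-level vector: mean, std, min, max per coefficient."""
    return np.concatenate([
        frame_mfccs.mean(axis=0),
        frame_mfccs.std(axis=0),
        frame_mfccs.min(axis=0),
        frame_mfccs.max(axis=0),
    ])

# Example: 200 frames of 13 hypothetical MFCCs -> one 52-dim vector
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 13))
features = utterance_stats(frames)
```

The resulting vector has a fixed length regardless of utterance duration, which is what makes utterance-level classifiers applicable.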
Multi-class and hierarchical SVMs for emotion recognition
This paper extends binary support vector machines to multiclass classification for recognising emotions from speech. We apply two standard schemes (one-versus-one and one-versus-rest) and two schemes that form a hierarchy of classifiers, each making a distinct binary decision about class membership, on three publicly available databases. Using the OpenEAR toolkit to extract more than 6000 feature...
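The one-versus-one scheme mentioned above can be illustrated with a toy example: train (or here, define) one binary decision per class pair and let majority voting pick the label. The three emotion clusters and the nearest-class-mean binary rule are stand-ins for the paper's SVMs, chosen only to keep the sketch self-contained:

```python
import numpy as np
from itertools import combinations

# Toy data: three emotion classes as well-separated 2-D clusters
# (hypothetical features, not real speech descriptors)
rng = np.random.default_rng(1)
means = {"angry": [0, 0], "happy": [5, 0], "sad": [0, 5]}
X = np.vstack([rng.normal(m, 0.5, size=(20, 2)) for m in means.values()])
y = np.repeat(list(means), 20)

def one_vs_one_predict(samples):
    """One-versus-one scheme: each class pair gets a binary decision
    (here a nearest-class-mean rule); majority vote picks the label."""
    classes = list(means)
    votes = np.zeros((len(samples), len(classes)), dtype=int)
    for i, j in combinations(range(len(classes)), 2):
        di = np.linalg.norm(samples - np.asarray(means[classes[i]]), axis=1)
        dj = np.linalg.norm(samples - np.asarray(means[classes[j]]), axis=1)
        votes[di < dj, i] += 1   # vote for class i where it is closer
        votes[di >= dj, j] += 1  # otherwise vote for class j
    return np.array(classes)[votes.argmax(axis=1)]

pred = one_vs_one_predict(X)
accuracy = (pred == y).mean()
```

One-versus-rest differs only in that each binary decision separates one class from all others, so k classifiers are trained instead of k(k-1)/2.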
Journal
Journal title: Applied Sciences
Year: 2020
ISSN: 2076-3417
DOI: 10.3390/app10134447